LUCENE-10061: Implements dynamic pruning support for CombinedFieldsQuery #418
Conversation
…eldsQuery
// HighMed: from office # freq=3224339 freq=225338
//   top scores time usage 425 milliseconds
//   complete time usage 401 milliseconds
// HighHigh: but publisher # freq=1456553 freq=1289029
//   top scores time usage 469 milliseconds
//   complete time usage 322 milliseconds
// HighLow: with fung # freq=3709421 freq=1344
//   top scores time usage 241 milliseconds
//   complete time usage 428 milliseconds
// HighLow: date insult # freq=2020626 freq=4424
//   top scores time usage 171 milliseconds
//   complete time usage 225 milliseconds
Perf tests result
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.53 (3.3%) 2.92 (5.3%) -17.2% ( -25% - -8%) 0.000
PKLookup 108.13 (7.7%) 119.85 (8.2%) 10.8% ( -4% - 28%) 0.000
CFQHighLow 14.88 (3.9%) 16.69 (12.5%) 12.2% ( -3% - 29%) 0.000
CFQHighMed 21.11 (4.1%) 25.87 (11.8%) 22.6% ( 6% - 40%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 6.64 (3.1%) 5.63 (10.2%) -15.2% ( -27% - -1%) 0.000
CFQHighLow 8.35 (2.8%) 8.05 (15.0%) -3.6% ( -20% - 14%) 0.297
CFQHighMed 24.51 (5.3%) 24.90 (19.9%) 1.6% ( -22% - 28%) 0.733
PKLookup 110.06 (10.0%) 128.54 (7.9%) 16.8% ( -1% - 38%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 13.01 (2.9%) 9.82 (7.8%) -24.5% ( -34% - -14%) 0.000
PKLookup 107.85 (8.1%) 111.79 (11.2%) 3.7% ( -14% - 24%) 0.236
CFQHighHigh 4.83 (2.6%) 5.06 (8.6%) 4.7% ( -6% - 16%) 0.018
CFQHighLow 14.95 (3.0%) 18.31 (19.0%) 22.5% ( 0% - 45%) 0.000
Run 4:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 11.11 (2.9%) 6.69 (4.1%) -39.7% ( -45% - -33%) 0.000
CFQHighLow 27.55 (3.8%) 25.46 (11.0%) -7.6% ( -21% - 7%) 0.003
CFQHighHigh 5.25 (3.2%) 4.96 (6.1%) -5.7% ( -14% - 3%) 0.000
PKLookup 107.61 (6.7%) 121.19 (4.6%) 12.6% ( 1% - 25%) 0.000
Perf tests result for commit 2ba435e:
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 4.64 (6.5%) 3.30 (4.7%) -29.0% ( -37% - -19%) 0.000
CFQHighHigh 11.09 (6.0%) 9.61 (6.0%) -13.3% ( -23% - -1%) 0.000
PKLookup 103.38 (4.4%) 108.04 (4.3%) 4.5% ( -4% - 13%) 0.001
CFQHighMedLow 10.58 (6.1%) 12.30 (8.7%) 16.2% ( 1% - 33%) 0.000
CFQHighMed 10.70 (7.4%) 15.51 (11.2%) 44.9% ( 24% - 68%) 0.000
CFQHighLowLow 8.18 (8.2%) 12.87 (11.6%) 57.3% ( 34% - 84%) 0.000
CFQHighLow 14.57 (7.5%) 30.81 (15.1%) 111.4% ( 82% - 144%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 5.33 (5.7%) 4.02 (7.7%) -24.4% ( -35% - -11%) 0.000
CFQHighLowLow 17.14 (6.2%) 13.06 (5.4%) -23.8% ( -33% - -13%) 0.000
CFQHighMed 17.37 (5.8%) 14.38 (7.7%) -17.2% ( -29% - -3%) 0.000
PKLookup 103.57 (5.5%) 108.84 (5.9%) 5.1% ( -6% - 17%) 0.005
CFQHighMedLow 11.25 (7.2%) 12.70 (9.0%) 12.9% ( -3% - 31%) 0.000
CFQHighHigh 5.00 (6.2%) 7.54 (12.1%) 51.0% ( 30% - 73%) 0.000
CFQHighLow 21.60 (5.2%) 34.57 (14.1%) 60.0% ( 38% - 83%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 5.40 (6.9%) 4.06 (5.1%) -24.8% ( -34% - -13%) 0.000
CFQHighMedLow 7.64 (7.4%) 5.79 (6.3%) -24.2% ( -35% - -11%) 0.000
CFQHighHigh 11.11 (7.0%) 9.60 (5.9%) -13.6% ( -24% - 0%) 0.000
CFQHighLowLow 21.21 (7.6%) 21.22 (6.6%) 0.0% ( -13% - 15%) 0.993
PKLookup 103.15 (5.9%) 107.60 (6.9%) 4.3% ( -8% - 18%) 0.034
CFQHighLow 21.85 (8.1%) 34.18 (13.5%) 56.4% ( 32% - 84%) 0.000
CFQHighMed 12.07 (8.4%) 19.98 (16.7%) 65.5% ( 37% - 98%) 0.000
Run 4:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 8.50 (5.8%) 6.85 (5.2%) -19.5% ( -28% - -8%) 0.000
CFQHighMedLow 10.89 (5.7%) 8.96 (5.4%) -17.8% ( -27% - -7%) 0.000
CFQHighMed 8.41 (5.8%) 7.74 (5.6%) -7.9% ( -18% - 3%) 0.000
CFQHighHighHigh 3.45 (6.7%) 3.38 (5.3%) -2.0% ( -13% - 10%) 0.287
CFQHighLowLow 7.82 (6.4%) 8.20 (7.5%) 4.8% ( -8% - 20%) 0.030
PKLookup 103.50 (5.0%) 110.69 (5.4%) 6.9% ( -3% - 18%) 0.000
CFQHighLow 11.46 (6.0%) 13.16 (6.7%) 14.8% ( 1% - 29%) 0.000
Perf test result from 1a71469
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 16.99 (4.9%) 16.84 (14.9%) -0.9% ( -19% - 19%) 0.790
PKLookup 98.49 (8.7%) 121.99 (9.1%) 23.9% ( 5% - 45%) 0.000
CFQHighHigh 5.82 (4.7%) 7.55 (10.0%) 29.8% ( 14% - 46%) 0.000
CFQHighLow 17.59 (5.2%) 25.01 (17.3%) 42.2% ( 18% - 68%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 18.28 (6.1%) 15.63 (15.6%) -14.5% ( -34% - 7%) 0.000
PKLookup 101.26 (10.0%) 105.39 (12.0%) 4.1% ( -16% - 28%) 0.243
CFQHighHigh 9.96 (5.6%) 10.47 (13.0%) 5.2% ( -12% - 25%) 0.101
CFQHighLow 9.71 (5.8%) 15.71 (34.0%) 61.8% ( 20% - 107%) 0.000
I re-ran the perf test after mikemccand/luceneutil@0550148 was merged.
For one of the most negatively impacted queries (-42.0%), the profile suggests quite some CPU time is spent on merging impacts within the same field. I suspect this may occur because the max score is computed too frequently, as a frequent term's skip list would be "dense" and is also used to determine upTo for max score (see lucene/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java, lines 78 to 82 in 2a9adb8).
Hi @jpountz @jimczi, while taking a deeper dive into how this PR can be improved further, I noticed a difference in the field-weight parameter passed to create:
lucene/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java, lines 420 to 421 in 3b914a4
lucene/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java, lines 387 to 389 in 3b914a4
For lucene/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java, lines 404 to 414 in 3b914a4, … whereas I feel …
For now I have created a spin-off issue https://issues.apache.org/jira/browse/LUCENE-10236 and a separate PR #444 for this. Please let me know if they look good to you.
    }
    return result;
  }
I'm suspecting this may occur when the max score is being computed too frequently, as frequent term's skip list would be "dense" and is also used to determine upTo for max score
That would make sense to me. Maybe update this class so that numLevels / getDocIdUpTo only use the impacts of the term that has the highest weight?
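What that suggestion might look like as a sketch, using simplified stand-in types rather than Lucene's actual Impacts API (the interface and class names below are illustrative assumptions, not the real classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal stand-in for the part of Lucene's Impacts API used here.
interface SimpleImpacts {
    int numLevels();
    int getDocIdUpTo(int level);
}

// Skip-list geometry (number of levels and their docId bounds) is driven by a
// single lead field's impacts, instead of aggregating across all fields.
class LeadFieldImpacts {
    private final SimpleImpacts lead;

    LeadFieldImpacts(Map<String, SimpleImpacts> impactsPerField, String leadField) {
        this.lead = impactsPerField.get(leadField);
    }

    int numLevels() {
        // Delegate to the lead field only; other fields' impacts would be
        // approximated elsewhere when merging.
        return lead.numLevels();
    }

    int getDocIdUpTo(int level) {
        return lead.getDocIdUpTo(level);
    }
}
```

The idea is that the lead field's skip structure alone decides when score bounds expire, avoiding the cross-field min/max aggregation discussed above.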
Thanks for the suggestion! I assume by "highest weight" here, you meant the term that has lower doc frequencies, as opposed to field weight?
I also did a quick test with the following updated numLevels / getDocIdUpTo implementations, to approximate using the lower-doc-frequency term's impacts:
@Override
public int numLevels() {
// this is changed from Integer.MIN_VALUE
int result = Integer.MAX_VALUE;
// this is changed from Math.max
for (Impacts impacts : leadingImpactsPerField.values()) {
result = Math.min(result, impacts.numLevels());
}
return result;
}
@Override
public int getDocIdUpTo(int level) {
// this is changed from Integer.MAX_VALUE
int result = Integer.MIN_VALUE;
for (Impacts impacts : leadingImpactsPerField.values()) {
if (impacts.numLevels() > level) {
// this is changed from Math.min
result = Math.max(result, impacts.getDocIdUpTo(level));
}
}
return result;
}
For the slow query CFQHighHigh: at united +combinedFields=titleTokenized^4.0,body^2.0 # freq=2834104 freq=1185528 above, it did improve the perf hit from -42% to -30% ~ -35%, with the following JFR CPU result:
PERCENT CPU SAMPLES STACK
19.41% 11866 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.35% 4491 org.apache.lucene.search.DisiPriorityQueue#downHeap()
6.07% 3713 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
3.59% 2193 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
3.50% 2143 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.24% 1983 org.apache.lucene.search.DisjunctionDISIApproximation#advance()
3.09% 1889 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.87% 1752 org.apache.lucene.search.DisiPriorityQueue#top()
2.75% 1681 java.lang.Math#round()
2.71% 1657 org.apache.lucene.search.DisiPriorityQueue#topList()
2.54% 1555 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
2.36% 1441 org.apache.lucene.store.ByteBufferGuard#ensureValid()
2.11% 1292 org.apache.lucene.util.SmallFloat#longToInt4()
1.87% 1142 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
1.73% 1058 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.71% 1045 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
1.68% 1030 org.apache.lucene.store.ByteBufferGuard#getByte()
1.59% 970 jdk.internal.misc.Unsafe#getByte()
1.57% 959 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.55% 948 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.18% 720 org.apache.lucene.search.ImpactsDISI#docID()
0.84% 515 org.apache.lucene.sandbox.search.CombinedFieldQuery$1$1#doMergeImpactsPerField()
0.79% 485 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.76% 462 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.73% 444 java.lang.Math#toIntExact()
0.60% 367 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.53% 324 org.apache.lucene.search.ImpactsDISI#advanceTarget()
0.51% 314 org.apache.lucene.codecs.MultiLevelSkipListReader#skipTo()
0.50% 306 org.apache.lucene.util.SmallFloat#intToByte4()
0.49% 299 org.apache.lucene.search.ImpactsDISI#nextDoc()
This CPU profiling result looks very similar to that of the baseline. I'll do more testing to understand why.
Hi @jpountz, I just noticed a place in the implementation that was called multiple times, so I tried to "cache" the result to save the repeated computation (6d5e780). The improvement turned out to be quite significant:
Results from python3 src/python/localrun.py -source combinedFieldsBig
Run 1
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.17 (3.5%) 2.70 (4.6%) -14.8% ( -22% - -6%) 0.000
PKLookup 117.15 (6.7%) 116.31 (10.3%) -0.7% ( -16% - 17%) 0.792
CFQHighMed 12.98 (5.2%) 18.26 (13.9%) 40.7% ( 20% - 63%) 0.000
CFQHighLow 17.30 (3.6%) 36.19 (20.1%) 109.2% ( 82% - 137%) 0.000
Run 2
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 104.98 (6.9%) 115.76 (8.5%) 10.3% ( -4% - 27%) 0.000
CFQHighMed 5.73 (3.9%) 6.53 (8.4%) 14.0% ( 1% - 27%) 0.000
CFQHighLow 17.87 (3.4%) 20.37 (8.1%) 14.0% ( 2% - 26%) 0.000
CFQHighHigh 5.41 (4.2%) 6.53 (8.9%) 20.7% ( 7% - 35%) 0.000
Run 3
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.18 (3.6%) 2.66 (5.3%) -16.4% ( -24% - -7%) 0.000
CFQHighMed 5.75 (4.0%) 5.07 (5.5%) -11.7% ( -20% - -2%) 0.000
PKLookup 136.87 (5.9%) 133.97 (9.4%) -2.1% ( -16% - 14%) 0.393
CFQHighLow 23.19 (2.9%) 38.74 (16.0%) 67.0% ( 46% - 88%) 0.000
Results from python3 src/python/localrun.py -source combinedFieldsUnevenlyWeightedBig
Run 1
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 102.30 (8.5%) 101.05 (15.1%) -1.2% ( -22% - 24%) 0.753
CFQHighMed 5.73 (6.7%) 6.61 (12.5%) 15.5% ( -3% - 37%) 0.000
CFQHighHigh 5.37 (6.2%) 7.28 (12.7%) 35.6% ( 15% - 58%) 0.000
CFQHighLow 13.61 (6.8%) 27.02 (34.4%) 98.5% ( 53% - 150%) 0.000
Run 2
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 5.63 (4.0%) 4.87 (6.6%) -13.5% ( -23% - -2%) 0.000
PKLookup 95.27 (9.3%) 102.84 (9.0%) 7.9% ( -9% - 28%) 0.006
CFQHighHigh 5.45 (3.7%) 7.43 (10.6%) 36.3% ( 21% - 52%) 0.000
CFQHighLow 13.91 (4.6%) 29.42 (28.2%) 111.5% ( 75% - 151%) 0.000
Run 3
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 74.67 (39.5%) 97.84 (23.3%) 31.0% ( -22% - 155%) 0.002
CFQHighHigh 5.61 (4.5%) 7.62 (10.0%) 35.8% ( 20% - 52%) 0.000
CFQHighMed 10.44 (3.5%) 15.65 (23.8%) 50.0% ( 21% - 80%) 0.000
CFQHighLow 14.00 (4.2%) 25.99 (33.0%) 85.6% ( 46% - 128%) 0.000
However, this optimization doesn't help much for the query CFQHighHigh: at united +combinedFields=titleTokenized^4.0,body^2.0 # freq=2834104 freq=1185528, as the dense skip list for at still causes frequent max score computation (although giving the field titleTokenized more weight does reduce the perf hit from -42.0% to -34.3%). I tried to play with some different numLevels and getDocIdUpTo implementations, but so far I find it hard to balance the following two:
- lower docIdUpTo -> lower max score -> more frequent max score computation, but also more effective pruning due to lower max score
- higher docIdUpTo -> higher max score -> less frequent max score computation, but also less effective in pruning due to higher max score
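To make this trade-off concrete, here is a toy sketch (not Lucene code; class and method names are illustrative) of the two ways the per-field docIdUpTo bounds can be aggregated at a given skip level:

```java
import java.util.List;

// Each entry is one field's getDocIdUpTo(level) at the same skip level.
class DocIdUpToChoice {

    // Conservative choice: every field's impacts are valid up to this doc,
    // so the max-score bound is tight (lower), but it expires sooner and
    // must be recomputed more frequently.
    static int minUpTo(List<Integer> upTos) {
        return upTos.stream().mapToInt(Integer::intValue).min().orElse(Integer.MAX_VALUE);
    }

    // Loose choice: the bound stays valid longer (fewer recomputations), but
    // fields whose own upTo is smaller must contribute looser impacts, so the
    // max-score bound is higher and pruning is less effective.
    static int maxUpTo(List<Integer> upTos) {
        return upTos.stream().mapToInt(Integer::intValue).max().orElse(Integer.MIN_VALUE);
    }
}
```

With upTo bounds of 100 (a rare term) and 5000 (a frequent term), minUpTo gives 100 (tight but short-lived) while maxUpTo gives 5000 (long-lived but loose), matching the two bullets above.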
Thanks for running these experiments, the performance does look much better now indeed! I would feel good about merging something that has these performance characteristics.
I assume by "highest weight" here, you meant term that has lower doc frequencies, as opposed to field weight?
I meant field weight. Since we're using impacts from the field that has the highest weight and approximating impacts from other fields, my assumption was that it would help get better max score approximations.
That said, I expect that it would often have the same result as picking the field that has the lowest doc freq, since fields that have higher weight tend to have fewer terms per doc, so doc freqs are generally lower?
result = Math.max(result, impacts.getDocIdUpTo(level));
When it was Math.min, it hurt us because we would have to recompute score boundaries often. But with Math.max we are forcing all fields but one to return impacts on the next level. My gut feeling is that returning the getDocIdUpTo of the field that has the highest weight (or lowest doc freq, as you suggested) would perform better.
Thanks @jpountz for the confirmation and clarification! I gave these ideas a try in commit db2446f, with the following two similar approaches:
- Use the impacts of a) the highest-weight field + b) the lowest-doc-freq term within that field (commented out)
- Use the impacts of a) the highest-weight field + b) the lowest getDocIdUpTo term within that field
Here are the results from python3 src/python/localrun.py -source combinedFieldsUnevenlyWeightedBig
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 5.46 (7.7%) 4.58 (7.1%) -16.1% ( -28% - -1%) 0.000
CFQHighLow 21.83 (7.7%) 18.86 (7.2%) -13.6% ( -26% - 1%) 0.000
CFQHighHigh 3.44 (7.4%) 3.06 (7.4%) -11.1% ( -24% - 4%) 0.000
PKLookup 109.63 (11.8%) 109.67 (13.5%) 0.0% ( -22% - 28%) 0.993
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighLow 17.22 (10.2%) 11.73 (12.0%) -31.9% ( -49% - -10%) 0.000
CFQHighMed 10.20 (10.5%) 8.36 (9.3%) -18.1% ( -34% - 1%) 0.000
CFQHighHigh 3.65 (6.6%) 3.11 (5.3%) -14.8% ( -25% - -3%) 0.000
PKLookup 91.80 (26.0%) 100.13 (16.3%) 9.1% ( -26% - 69%) 0.187
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 5.62 (8.1%) 4.86 (8.0%) -13.5% ( -27% - 2%) 0.000
CFQHighHigh 4.13 (6.9%) 3.59 (5.8%) -13.0% ( -24% - 0%) 0.000
CFQHighLow 17.00 (8.2%) 15.06 (9.7%) -11.4% ( -27% - 7%) 0.000
PKLookup 99.61 (6.4%) 102.96 (6.7%) 3.4% ( -9% - 17%) 0.104
and their CPU profiling looks very similar to that of baseline, suggesting that pruning was not very effective there.
Interestingly, when I flipped the weights assigned to the body and titleTokenized fields in the task (assigning 20.0 to the body field and 2.0 to the titleTokenized field), I got the following result:
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 99.64 (9.1%) 94.80 (11.6%) -4.9% ( -23% - 17%) 0.142
CFQHighLow 17.31 (7.2%) 19.16 (15.0%) 10.7% ( -10% - 35%) 0.004
CFQHighMed 3.41 (6.4%) 4.57 (13.6%) 34.2% ( 13% - 57%) 0.000
CFQHighHigh 3.28 (6.3%) 5.32 (16.8%) 62.2% ( 36% - 91%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 4.10 (6.3%) 2.72 (5.8%) -33.6% ( -42% - -23%) 0.000
CFQHighMed 5.69 (5.6%) 4.48 (8.2%) -21.4% ( -33% - -8%) 0.000
PKLookup 104.53 (21.1%) 104.06 (16.8%) -0.5% ( -31% - 47%) 0.940
CFQHighLow 22.52 (7.4%) 35.31 (21.3%) 56.8% ( 26% - 92%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 80.39 (31.2%) 93.55 (19.4%) 16.4% ( -26% - 97%) 0.046
CFQHighHigh 5.71 (6.6%) 6.96 (13.4%) 21.7% ( 1% - 44%) 0.000
CFQHighMed 10.21 (5.6%) 12.62 (22.0%) 23.6% ( -3% - 54%) 0.000
CFQHighLow 16.89 (4.2%) 27.39 (22.9%) 62.1% ( 33% - 93%) 0.000
I think what might have happened here is: since the highest-weight field / lowest-doc-freq term tends to have fewer terms per doc, by definition its matches are also sparser in the corpus than those of a lower-weight field (i.e. more docs contain a specific term in their body field than in their titleTokenized field). Hence, when returning the getDocIdUpTo of the field that has the highest weight, a larger bound will be returned (probably close to Math.max?), causing the max score to be much higher and less effective for pruning.
What do you think of the commit and results? Is there anything I may have missed?
I re-ran the perf tests combinedFieldsUnevenlyWeightedBig after bug fix 3d0a215, and the results look similar:
With commit "test leadImpact from highest weighted field" (db2446f):
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 2.81 (10.9%) 2.40 (10.3%) -14.6% ( -32% - 7%) 0.000
CFQHighMed 3.27 (11.8%) 2.91 (11.0%) -11.2% ( -30% - 13%) 0.002
CFQHighLow 16.09 (14.6%) 14.58 (11.9%) -9.4% ( -31% - 20%) 0.025
PKLookup 92.08 (11.6%) 96.23 (13.3%) 4.5% ( -18% - 33%) 0.255
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighLow 15.76 (12.0%) 10.19 (12.9%) -35.3% ( -53% - -11%) 0.000
CFQHighMed 12.01 (13.3%) 8.79 (13.9%) -26.9% ( -47% - 0%) 0.000
CFQHighHigh 5.59 (10.8%) 4.40 (8.4%) -21.2% ( -36% - -2%) 0.000
PKLookup 93.36 (21.9%) 94.29 (22.6%) 1.0% ( -35% - 58%) 0.888
Without commit "test leadImpact from highest weighted field" (db2446f):
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighLow 16.69 (7.9%) 15.02 (14.1%) -10.0% ( -29% - 13%) 0.006
PKLookup 94.38 (9.7%) 105.59 (13.3%) 11.9% ( -10% - 38%) 0.001
CFQHighHigh 5.51 (7.0%) 7.59 (13.5%) 37.6% ( 15% - 62%) 0.000
CFQHighMed 10.24 (7.2%) 15.23 (20.0%) 48.8% ( 20% - 81%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 94.85 (11.6%) 92.29 (11.0%) -2.7% ( -22% - 22%) 0.448
CFQHighHigh 5.26 (13.7%) 6.25 (23.4%) 18.9% ( -16% - 64%) 0.002
CFQHighMed 3.39 (16.1%) 4.55 (29.7%) 34.3% ( -9% - 95%) 0.000
CFQHighLow 16.04 (14.9%) 29.22 (39.6%) 82.2% ( 24% - 160%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 68.96 (40.4%) 67.26 (40.1%) -2.5% ( -59% - 130%) 0.847
CFQHighHigh 4.85 (16.7%) 5.62 (33.8%) 15.7% ( -29% - 79%) 0.062
CFQHighMed 10.67 (20.1%) 12.47 (42.7%) 16.9% ( -38% - 99%) 0.110
CFQHighLow 14.07 (22.6%) 23.18 (68.7%) 64.8% ( -21% - 201%) 0.000
I guess the scoring function has taken care of / bounded the erroneous norm used in the approximation earlier, as it was effectively Long.MIN_VALUE.
Thanks for making progress on this, and sorry for the slow review cycles. I just had a quick look, and it seems that some computations can be moved up-front, such as deciding which field we use as a lead, since field weights do not change during query execution?
I wonder if there are things we can do to make the code simpler, such as sharing the logic that combines multiple terms within the same field with SynonymQuery.
@@ -402,14 +423,30 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOException
  public Scorer scorer(LeafReaderContext context) throws IOException {
    List<PostingsEnum> iterators = new ArrayList<>();
    List<FieldAndWeight> fields = new ArrayList<>();
    Map<String, List<ImpactsEnum>> fieldImpactsEnum = new HashMap<>(fieldAndWeights.size());
    Map<String, List<Integer>> fieldTermDocFreq = new HashMap<>(fieldAndWeights.size());
Do we actually need this list of doc freqs? They would be equal to ImpactsEnum#cost all the time?
Oh, I was actually not aware that ImpactsEnum#cost returns doc freqs directly. Will remove it later.
    maxWeight = weight;
    maxWeightField = field;
  }
}
Since field weights do not change over time, could we compute the field that has the highest weight up-front, instead of doing it every time getImpacts is called?
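What this up-front computation might look like, as a sketch with simplified, non-Lucene types (the class name and map shape are assumptions for illustration):

```java
import java.util.Map;

// Hoist the "highest-weight field" computation out of the per-call path,
// since field weights are fixed for the lifetime of the query.
class FieldWeights {
    private final String maxWeightField;

    FieldWeights(Map<String, Float> fieldAndWeights) {
        String best = null;
        float bestWeight = Float.NEGATIVE_INFINITY;
        for (Map.Entry<String, Float> e : fieldAndWeights.entrySet()) {
            if (e.getValue() > bestWeight) {
                bestWeight = e.getValue();
                best = e.getKey();
            }
        }
        // Computed once at construction, then reused on every getImpacts() call.
        this.maxWeightField = best;
    }

    String maxWeightField() {
        return maxWeightField;
    }
}
```

A per-call loop over fieldAndWeights would repeat this work for every impacts request; with the weights immutable, doing it once in the constructor is equivalent.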
No worries, and thanks for the review and feedback!
Yup, I can move it out to avoid repeated computation there. Just a quick check though: from the perf results so far (#418 (comment) & #418 (comment)), it seems using the most-weighted field to decide on the leading impacts may not yield the best result, as matches in the most-weighted field may also give a high docId upTo bound. If we were to drop that approach and revert the commit db2446f that introduced it, then the most-weighted field wouldn't be computed at all. Are you trying to see if perf tests can be further improved when the most-weighted field only gets computed once?
Yeah, I have the same thought there. I wasn't sure earlier as …
Yes. I liked this approach because it felt like it should work relatively well, since the field with the highest weight should drive scores anyway, and deciding up-front which clause leads impacts felt like it could simplify the logic a bit. But if it doesn't yield better results, let's fall back to the previous approach. Let's maybe just double-check with a profiler that the reason this approach performs worse is actually that we get worse score boundaries, and not some avoidable slow code like iterating over or looking up the various hash maps?
Maybe we could have a …
… replace doc freq collection with ImpactsEnum#cost
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighLow 21.45 (13.8%) 15.65 (18.0%) -27.1% ( -51% - 5%) 0.000
CFQHighMed 5.92 (4.2%) 4.52 (12.2%) -23.7% ( -38% - -7%) 0.000
CFQHighHigh 5.52 (5.0%) 4.45 (12.1%) -19.4% ( -34% - -2%) 0.000
PKLookup 110.45 (11.1%) 108.08 (15.7%) -2.1% ( -26% - 27%) 0.617

= Candidate CPU JFR
PROFILE SUMMARY from 86268 events (total: 86268)
tests.profile.mode=cpu tests.profile.count=30 tests.profile.stacksize=1 tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
15.48% 13355 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.37% 6359 org.apache.lucene.search.DisjunctionDISIApproximation#advance()
7.35% 6345 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
6.38% 5508 org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.18% 3605 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
4.10% 3538 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.01% 3456 org.apache.lucene.search.DisiPriorityQueue#topList()
3.67% 3169 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
3.36% 2899 org.apache.lucene.search.DisiPriorityQueue#top()
3.08% 2660 java.lang.Math#round()
2.40% 2067 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.38% 2057 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
1.94% 1675 org.apache.lucene.util.SmallFloat#longToInt4()
1.87% 1617 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.86% 1605 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.80% 1557 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.50% 1293 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.39% 1200 org.apache.lucene.search.ImpactsDISI#docID()
1.33% 1145 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.24% 1067 jdk.internal.misc.Unsafe#getByte()
1.18% 1019 org.apache.lucene.store.ByteBufferGuard#getByte()
0.92% 792 org.apache.lucene.util.SmallFloat#intToByte4()
0.76% 659 java.lang.Math#toIntExact()
0.68% 584 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.67% 579 org.apache.lucene.search.ImpactsDISI#nextDoc()
0.61% 530 org.apache.lucene.search.ImpactsDISI#advanceTarget()
0.58% 500 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.57% 492 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.40% 345 org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.39% 333 org.apache.lucene.codecs.lucene90.ForUtil#expand8()

= Baseline CPU JFR
PROFILE SUMMARY from 76683 events (total: 76683)
tests.profile.mode=cpu tests.profile.count=30 tests.profile.stacksize=1 tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
16.26% 12468 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
8.46% 6490 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
8.24% 6315 org.apache.lucene.search.DisjunctionDISIApproximation#nextDoc()
6.13% 4698 org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.79% 3676 org.apache.lucene.search.DisiPriorityQueue#topList()
4.45% 3412 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.32% 3314 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.25% 2492 java.lang.Math#round()
3.01% 2309 org.apache.lucene.search.DisiPriorityQueue#top()
2.75% 2107 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.62% 2010 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.29% 1753 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.12% 1623 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.05% 1569 org.apache.lucene.util.SmallFloat#longToInt4()
1.93% 1481 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.77% 1358 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.45% 1110 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.25% 958 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.19% 914 org.apache.lucene.store.ByteBufferGuard#getByte()
1.17% 898 java.lang.Math#toIntExact()
1.06% 815 org.apache.lucene.util.SmallFloat#intToByte4()
1.02% 782 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#freq()
0.86% 656 jdk.internal.misc.Unsafe#getByte()
0.68% 518 org.apache.lucene.codecs.lucene90.PForUtil#decodeAndPrefixSum()
0.64% 492 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.49% 375 org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.42% 322 org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.31% 239 java.util.zip.Inflater#inflateBytesBytes()
0.30% 228 org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#seekExact()
0.29% 224 org.apache.lucene.codecs.lucene90.PForUtil#expand32()
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.68 (6.5%) 2.66 (7.3%) -27.8% ( -39% - -14%) 0.000
CFQHighLow 14.19 (7.9%) 11.31 (8.0%) -20.3% ( -33% - -4%) 0.000
CFQHighMed 12.92 (9.5%) 10.41 (9.9%) -19.4% ( -35% - 0%) 0.000
PKLookup 104.92 (6.7%) 104.49 (9.0%) -0.4% ( -15% - 16%) 0.869

= Candidate CPU JFR
PROFILE SUMMARY from 87201 events (total: 87201)
tests.profile.mode=cpu tests.profile.count=30 tests.profile.stacksize=1 tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
15.13% 13191 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.42% 6466 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
7.29% 6354 org.apache.lucene.search.DisjunctionDISIApproximation#advance()
6.79% 5921 org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.34% 3785 org.apache.lucene.search.DisiPriorityQueue#topList()
3.97% 3465 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
3.85% 3360 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
3.18% 2777 java.lang.Math#round()
3.01% 2624 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.01% 2624 org.apache.lucene.search.DisiPriorityQueue#top()
2.33% 2036 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.20% 1917 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.07% 1809 org.apache.lucene.util.SmallFloat#longToInt4()
1.81% 1580 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.78% 1555 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.77% 1545 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.51% 1316 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.39% 1210 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.29% 1124 org.apache.lucene.search.ImpactsDISI#docID()
1.22% 1063 jdk.internal.misc.Unsafe#getByte()
1.19% 1037 org.apache.lucene.store.ByteBufferGuard#getByte()
1.08% 938 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.97% 849 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.86% 750 org.apache.lucene.codecs.MultiLevelSkipListReader#skipTo()
0.82% 717 org.apache.lucene.util.SmallFloat#intToByte4()
0.78% 683 java.lang.Math#toIntExact()
0.62% 540 org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.61% 529 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.51% 441 org.apache.lucene.search.ImpactsDISI#nextDoc()
0.50% 435 org.apache.lucene.search.ImpactsDISI#advanceTarget()

= Baseline CPU JFR
PROFILE SUMMARY from 81578 events (total: 81578)
tests.profile.mode=cpu tests.profile.count=30 tests.profile.stacksize=1 tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
16.38% 13363 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
8.67% 7073 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
8.05% 6569 org.apache.lucene.search.DisjunctionDISIApproximation#nextDoc()
6.50% 5305 org.apache.lucene.search.DisiPriorityQueue#downHeap()
5.17% 4220 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
4.40% 3587 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.30% 3505 org.apache.lucene.search.DisiPriorityQueue#topList()
3.37% 2751 java.lang.Math#round()
2.77% 2263 org.apache.lucene.search.DisiPriorityQueue#top()
2.64% 2156 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.62% 2135 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.12% 1729 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.11% 1723 org.apache.lucene.util.SmallFloat#longToInt4()
2.06% 1678 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.01% 1640 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.77% 1442 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.62% 1325 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.26% 1028 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.10% 899 java.lang.Math#toIntExact()
1.10% 898 org.apache.lucene.util.SmallFloat#intToByte4()
1.09% 888 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#freq()
1.06% 864 org.apache.lucene.store.ByteBufferGuard#getByte()
0.80% 652 jdk.internal.misc.Unsafe#getByte()
0.74% 605 org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.68% 552 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.60% 492 org.apache.lucene.codecs.lucene90.PForUtil#decodeAndPrefixSum()
0.55% 451 java.io.RandomAccessFile#readBytes()
0.38% 314 org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.38% 307 org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.32% 260 org.apache.lucene.codecs.lucene90.PForUtil#expand32()
Ah yes using
I see. The latest two commits 808fec2 & 75c5b04 used the max-weight field to drive scoring, and overall they had around a -20% impact on tasks. Candidate CPU JFR:
Baseline CPU JFR
As the two profiles look very similar, and based on my previous debugging, this may suggest that the computed max score is high and most docs are not being skipped, so the candidate implementation ends up examining roughly the same number of docs as the baseline.
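For context on the skipping decision discussed here: an impacts-based scorer derives a per-block score upper bound from its (freq, norm) impact pairs and can only skip the block when that bound is below the minimum competitive score. A minimal self-contained sketch of that decision (illustrative only; class and method names and the BM25 parameters are made up, not the actual Lucene implementation):

```java
import java.util.List;

public class MaxScoreSketch {
    // A (freq, norm) pair as carried by an impact; norm here stands in for an
    // illustrative document length, not the encoded byte Lucene stores.
    record Impact(int freq, long norm) {}

    // BM25-like upper bound: the score grows with freq and shrinks with norm,
    // so the best (freq, norm) combination bounds the whole block's score.
    static float maxScore(List<Impact> impacts, float k1, float b, float avgdl, float idf) {
        float max = 0f;
        for (Impact i : impacts) {
            float tf = i.freq() / (i.freq() + k1 * (1 - b + b * i.norm() / avgdl));
            max = Math.max(max, idf * tf);
        }
        return max;
    }

    // The pruning decision: a block may be skipped only when its score upper
    // bound cannot beat the current minimum competitive score.
    static boolean canSkipBlock(List<Impact> impacts, float minCompetitiveScore) {
        return maxScore(impacts, 1.2f, 0.75f, 10f, 2.0f) < minCompetitiveScore;
    }

    public static void main(String[] args) {
        List<Impact> lowImpacts = List.of(new Impact(1, 100));  // rare term, long doc
        List<Impact> highImpacts = List.of(new Impact(50, 10)); // frequent term, short doc
        System.out.println(canSkipBlock(lowImpacts, 1.0f));  // true: bound below threshold
        System.out.println(canSkipBlock(highImpacts, 1.0f)); // false: block must be scored
    }
}
```

If the computed bound stays above the minimum competitive score for most blocks, as suspected in the comment above, nothing gets skipped and the pruning scorer does essentially the same work as exhaustive evaluation.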
 * Merge impacts from multiple impactsEnum (terms matches) within the same field. The high level
 * logic is to combine freqs that have the same norm from impacts.
 */
public static List<Impact> mergeImpactsPerField(
MixedMutabilityReturnType: This method returns both mutable and immutable collections or maps from different paths. This may be confusing for users of the method. (details)
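The merge described in the doc comment above can be sketched outside Lucene. Assuming, for illustration, that the combined freq upper bound for a given norm is the sum of each term's freq at that norm (the real mergeImpactsPerField in this PR operates on per-term impact lists from the index and is more involved), a minimal version could look like:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ImpactMergeSketch {
    record Impact(int freq, long norm) {}

    // Merge impacts from several terms of the same field: for a given norm,
    // a doc may match every term, so the combined freq upper bound at that
    // norm is the sum of each term's max freq at that norm.
    static List<Impact> mergeImpactsPerField(List<List<Impact>> perTermImpacts) {
        TreeMap<Long, Long> freqByNorm = new TreeMap<>(); // sorted by norm
        for (List<Impact> termImpacts : perTermImpacts) {
            for (Impact impact : termImpacts) {
                freqByNorm.merge(impact.norm(), (long) impact.freq(), Long::sum);
            }
        }
        List<Impact> merged = new ArrayList<>();
        for (var e : freqByNorm.entrySet()) {
            // guard against overflow of the summed freqs
            merged.add(new Impact((int) Math.min(Integer.MAX_VALUE, e.getValue()), e.getKey()));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Impact> merged = mergeImpactsPerField(List.of(
                List.of(new Impact(3, 5), new Impact(7, 9)), // impacts of term A
                List.of(new Impact(2, 5))));                 // impacts of term B
        System.out.println(merged); // [Impact[freq=5, norm=5], Impact[freq=7, norm=9]]
    }
}
```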
I just added two logging statements in
and the following query was used:
For the implementation that uses mostly weighted field and lowest cost / doc freq to drive scoring,
For the implementation that uses lowest
These numbers are super interesting. Let's go with the approach of your baseline for now since it performs better; sorry for putting you on a track that performs worse. :) Regarding the new utility class, I was hoping that it could be a higher-level abstraction, e.g. a
No problem! It's actually quite enjoyable for me to explore different approaches and compare their performance characteristics. I may also miss certain aspects, so this discussion helps verify my understanding as well! I've re-implemented the other approach in cbbd28b.
Yeah, I did give that a try earlier, but I ended up with the utility-methods approach as I saw a potential issue. I implemented it again in 502d44e. If my understanding is correct, the intention of using a higher-level abstraction there was to reduce code duplication by using a loop over
Hi @jpountz, just want to circle back to this discussion and see if you have further suggestions on the above analysis?
Hi @jpountz, just want to check again whether I have addressed your concern with the utility approach? To better illustrate the potential issue I see with the higher-level abstraction approach, I've copied over some key code snippets here: If we were to extract out
However, the above
Hence we may actually need to change some APIs if we were to make
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!
Description
This PR is a work in progress that implements basic dynamic pruning support for CombinedFieldsQuery.
#11099
Tests
Added a randomized test to verify that the top 100 results match between top-scores mode (with pruning) and complete scoring.
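The idea behind that test can be sketched without Lucene: score a corpus once exhaustively and once with threshold-based skipping, and check that both produce the same top-k. (Illustrative only; the PR's actual test goes through IndexSearcher, and all names here are made up.)

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

public class PruningEquivalenceSketch {
    // Exhaustive top-k: look at every score.
    static List<Float> completeTopK(float[] scores, int k) {
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of the best k
        for (float s : scores) {
            heap.offer(s);
            if (heap.size() > k) heap.poll();
        }
        List<Float> top = new ArrayList<>(heap);
        top.sort(Comparator.reverseOrder());
        return top;
    }

    // Pruned top-k: skip any doc whose score upper bound (which must be >= its
    // real score) is not competitive with the current k-th best score.
    static List<Float> prunedTopK(float[] scores, float[] upperBounds, int k) {
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (int doc = 0; doc < scores.length; doc++) {
            float minCompetitive = heap.size() == k ? heap.peek() : Float.NEGATIVE_INFINITY;
            if (upperBounds[doc] <= minCompetitive) continue; // dynamic pruning
            heap.offer(scores[doc]);
            if (heap.size() > k) heap.poll();
        }
        List<Float> top = new ArrayList<>(heap);
        top.sort(Comparator.reverseOrder());
        return top;
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        float[] scores = new float[10_000];
        float[] bounds = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = r.nextFloat();
            bounds[i] = scores[i] + r.nextFloat() * 0.1f; // bound must dominate the score
        }
        if (!completeTopK(scores, 100).equals(prunedTopK(scores, bounds, 100))) {
            throw new AssertionError("pruned and complete top-100 differ");
        }
        System.out.println("top-100 match");
    }
}
```

As long as the upper bounds dominate the true scores, pruning changes which docs are examined but not the final top-k, which is exactly the invariant the randomized test checks.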